Using Web Corpus Statistics to Infer Conceptual Structure

Authors

  • Brandon M. Lock
  • Eugene Agichtein
  • Kevin J. Holmes
  • Phillip Wolff

Abstract

The basic level is the level of conceptual structure at which categories are maximally informative. In this research, we investigated whether the privileged status of the basic level might be captured by the statistical properties of the Web. Using Google’s Web search programming interface, we found that frequency ratios for terms across three levels of abstraction (superordinate, basic, and subordinate) significantly predicted human participants’ spontaneous labeling of images obtained via Mechanical Turk. Specifically, the Web statistics paralleled participants’ preference for superordinate labels for natural kinds (e.g., trees, fish) and basic-level labels for other categories. Further, analyses of genre-specific text from the Corpus of Contemporary American English revealed that children’s texts were significantly more predictive than academic texts. Our findings suggest that distributional statistics from subsets of the Web can be used to infer properties of conceptual structure, potentially offering a powerful, high-resolution, yet low-cost tool for empirically testing theoretical predictions.
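The frequency-ratio idea in the abstract can be illustrated with a minimal sketch. It assumes that a "frequency ratio" is each term's share of the summed counts across the three levels, and that the level with the largest share is the predicted label; the abstract does not specify the exact computation, so this is an illustration rather than the authors' procedure. The function names and the counts are hypothetical and are not real Web data.

```python
from typing import Dict

# The three levels of abstraction named in the abstract.
LEVELS = ("superordinate", "basic", "subordinate")


def level_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each level's share of the total count (assumed ratio definition)."""
    total = sum(counts[level] for level in LEVELS)
    if total == 0:
        raise ValueError("at least one count must be non-zero")
    return {level: counts[level] / total for level in LEVELS}


def predicted_label_level(counts: Dict[str, int]) -> str:
    """Pick the level whose term has the largest share of the counts."""
    shares = level_shares(counts)
    return max(LEVELS, key=lambda level: shares[level])


# Hypothetical counts for one term triple (e.g. a superordinate, a basic-level,
# and a subordinate term for the same object); the numbers are made up.
example_counts = {"superordinate": 1200, "basic": 5400, "subordinate": 300}

if __name__ == "__main__":
    print(level_shares(example_counts))
    print(predicted_label_level(example_counts))
```

In a real analysis the counts would come from a search interface or a corpus, and the predicted level would be compared against the labels that participants actually produced.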

Similar resources

Spoken to Spoken vs. Spoken to Written: Corpus Approach to Exploring Interpreting and Subtitling

issue of Polibits includes a selection of papers related to the topic of processing of semantic information. Processing of semantic information involves usage of methods and technologies that help machines to understand the meaning of information. These methods automatically perform analysis, extraction, generation, interpretation, and annotation of information contained on the Web, corpus, nat...

A New Algorithmic Identity: Soft Biopolitics and the Modulation of Control

Marketing and web analytic companies have implemented sophisticated algorithms to observe, analyze, and identify users through large surveillance networks online. These computer algorithms have the capacity to infer categories of identity upon users based largely on their web-surfing habits. In this article I will first discuss the conceptual and theoretical work around code, outlining its use ...

How to Expand Dictionaries with Web-Mining Techniques

This paper presents an approach to enrich conceptual classes based on the Web. To test our approach, we first build conceptual classes using syntactic and semantic information provided by a corpus. The concepts can be the input of a dictionary. Our web-mining approach deals with a cognitive process which simulates human reasoning based on the enumeration principle. The experiments reveal the in...

How to Expand Dictionaries by Web-Mining Techniques

This paper presents an approach to enrich conceptual classes based on the Web. To test our approach, we first build conceptual classes using syntactic and semantic information provided by a corpus. The concepts can be the input of a dictionary. Our web-mining approach deals with a cognitive process which simulates human reasoning based on the enumeration principle. The experiments reveal the in...

Designing and Evaluating a Conceptual Model of Credibility Evaluation of Web Information: a Meta-synthesis and Delphi Study

Background and Aim: The current research aims to develop a literature-based and expert-modified model of credibility evaluation of web information. Methods: A mixed-methods approach was used. The research method is mixed-heuristic, combining qualitative and quantitative methodologies. In the first stage of the research, meta-synthesis was used as a qualita...

Publication date: 2011